Eliza Googles For Knowledge

Author

  • Jay Lofstead
Abstract

The Eliza [1] program relies on having a rich backing script to generate responses based on matches with words the user types. If it had a script the size and complexity of Google, it might be able to pass as a live human. If the user’s entry contains no words matching the script, the program could use the resources of Google to generate something new and appropriate. The introduction of the Google API [2] affords this possibility. While the marriage of the two technologies is by no means perfect, it does show promise. This paper describes the creation of the integration, the problems encountered, and some potential ideas for extending the integration.

Introduction

The idea of having a database the size and complexity of Google as a resource for a non-search application is intriguing. In 2002, Google introduced an API that allows programmatic use of the Google engine through a SOAP [9] interface. The advantages and challenges of having access to a database as varied and rich as Google became available to everyone. The initial difficulty lies in understanding how to take advantage of this database effectively. The first step is to identify a domain where these advantages could be exploited most easily. The first applications built on the API fell into two areas. One is search refinements, including structured document retrieval [7]; the other is language tools, among them help for dyslexics [4], grammatical assistance [5], and language understanding [6]. Artificial intelligence is another possible domain. For many years, it was believed that if an application could just have a large enough data store with the appropriate relationships, it would exhibit a sort of intelligence. Doing the data entry and drawing the relationships is a monumental task. Using a subset of the world, the Web, and an automated indexing tool, Google, it might be possible to simulate that level of intelligence.
Given the limited time of this project, a relatively simple application was chosen. The classic Eliza application with its default Rogerian psychotherapist interaction script provides a good first step for investigating the possibilities. Eliza has been around for nearly 40 years, affording experiments with its pattern-matching language models across multiple languages [1]. The perceived quality of the program is greatly influenced by the depth and variety of the script: the user’s interactions are richer when more matches are made and interesting responses are generated. However, it still does not understand any of the words typed in. Google similarly does not attempt to understand the context of words to generate a more meaningful response. By combining both tools, the hope was a richer Eliza experience through the more varied information generated by a Google query. The rest of the paper focuses on the features of Eliza that make it a good candidate, the features of the Google API that allow it to work in this domain, the problems encountered by selecting Eliza as the candidate, the shortcomings of the Google API for this particular application, integration issues, a discussion of the results, and some ideas about possible future directions for this work.

Features of Eliza

Eliza works by evaluating the sentence typed in by the user, looking for keywords matching those in its associated script file. The highest-ranked matching keyword rule is used to generate a response based on the pattern associated with the rule. Similar to Google, this usually works very well for generating a coherent reply to the user’s input. Some of these rules include instructions to “remember” pieces for later prompts when nothing is matched. In these matchless cases, an attempt is made to use a random item from “memory” to draw the user back to a point made earlier. If no memory entries are available, it uses the default null rule.
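The rule-selection process described above can be sketched roughly as follows; this is a minimal illustration, not Hayden’s implementation, and the miniature script, its ranks, and the response templates are invented for the example.

```python
import random

# Hypothetical miniature script: keyword -> (rank, response templates).
# Real Eliza scripts are far richer; these entries are invented.
SCRIPT = {
    "mother": (10, ["Tell me more about your family."]),
    "dream":  (8,  ["What does that dream suggest to you?"]),
    "always": (5,  ["Can you think of a specific example?"]),
}
NULL_RESPONSES = ["Please go on.", "I see.", "Tell me more."]

memory = []  # fragments "remembered" for later matchless turns

def respond(sentence):
    words = sentence.lower().split()
    matched = [w for w in words if w in SCRIPT]
    if matched:
        # Use the highest-ranked matching keyword rule.
        best = max(matched, key=lambda w: SCRIPT[w][0])
        memory.append(sentence)  # remember this input for later
        return random.choice(SCRIPT[best][1])
    if memory:
        # Matchless: draw the user back to an earlier point.
        return "Earlier you said: " + memory.pop(random.randrange(len(memory)))
    return random.choice(NULL_RESPONSES)  # default null rule
```

The key structural point is the three-level fallback: ranked keyword rules first, then remembered fragments, then the generic null rule.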
These null rule responses are very generic and are intended to get the user to continue typing without being specific about anything. Instead of always using these null rule responses, this is an ideal time to inject some Google results. While Eliza works well with a fairly rich script file, it is limited by the quality of the script file. Unless the file is infinitely large and complex, the program will occasionally have to use the memory and the null rule. Integrating an external knowledge source when the null rule would be used can extend the richness of the interaction while not overly complicating the script file or the application code.

Features of the Google API

The Google API provides a programmatic interface to the Google engine and database. Beyond searching, it supports some alternative requests such as spelling suggestions and retrieving a cached page in an HTML format. The search functionality is by far the richest of the three, with many parameters allowing query refinement. Some examples include “safe” searches eliminating any “questionable” sites from the results; natural language restricted searches; document type restrictions; and searching only a subset of the Google database. The API requires a user key that limits use to ten results per query and 1000 queries per day. Specifying the result at which to start the result set can retrieve additional responses beyond the initial ten for a query. The result set is returned in an XML format with discrete result set attributes and a list of results, each with its own attributes. Among the result set attributes available are an estimated total result count, search time, and search tips. Each result includes a relevant snippet of the matching document, summary information about the document, site information, the document URL, document title, directory category, and others. The directory category is particularly interesting.
By examining the category information for special keywords, such as “Business”, the results can be filtered by context to some degree. As handy as this feature is, only pages explicitly categorized by Google have this information; a large majority of pages in the system have not been categorized.

Issues with Eliza

While the base application is not terribly complex, the time constraints dictated using an existing implementation. The best choice would be an implementation that could easily be integrated with the Google API. Given that the example code and libraries provided by Google are written in Java and C#, one of those languages would be easiest. The best implementation of Eliza in either of these two languages is Charles Hayden’s Java implementation [3]. His Eliza code was written to run both as an applet and as an AWT application. Unfortunately, the AWT implementation is buggy and incomplete. These limitations encouraged use as an applet. Although it was initially easy to get it working as an applet, problems soon arose. The Java virtual machine on the client machine needed to be sufficiently new to handle newly compiled code, and getting the proper version of the plugin installed proved much more difficult than expected. Ultimately, in a student account on the campus machines, a fresh installation of the browser was required before the newer Java virtual machine plugin would be recognized. Once those problems were resolved and the base application was working properly, the Google API was introduced. The initial testing generated no Google responses when it should have, yet tests using the example Java program that comes with the API download worked perfectly. Several tests later, the explanation resurfaced: Java applets can only communicate with the server from which they were downloaded.
This effectively and completely killed the ability to use the integrated application as an applet or an AWT application without significant reworking of the code into a client-server form. As a fallback, a new command-line interface was added. It is much less friendly to use, but it does work. Even once the Google integration was working properly, additional problems remained. From an operational standpoint, it would be best if a specific keyword could be extracted from the unmatched text. Selecting a noun or verb, or noting negation, and performing the Google search on that limited text would generate a more targeted result. Since the tool only generates keywords on matches, this desired functionality is beyond the capabilities of the application. When it is appropriate to call Google, the application specifically and necessarily has no information except what the user has just typed in. This lack of context, and the lack of any parsing of the provided text, makes it difficult to perform a limited, targeted query of Google.

Issues with the Google API

While the API generates the expected result set, some of the features present on the web page are absent from the API results. For example, the results from the news search are not available, nor is it possible to search Google Groups. These limitations greatly reduce the usefulness of the application for this particular use. The results returned are not without their problems either. The snippet provides a short, contextual view of the page and is encoded in HTML, with bold tags marking the keywords found, a break tag for a forced line break, and encoded special characters. The tags are useful for a standard Google-style display, but interfere with the text-only response desired for Eliza. Decoding the special characters would require a large amount of special-purpose code, some of which may generate unprintable text-mode characters.
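A minimal version of the snippet clean-up described above might look like the following; the sample markup mirrors the bold tags, break tags, and encoded characters the API returns, though the exact function shape is an invention for illustration (and a modern standard library shortcut like `html.unescape` sidesteps most of the hand-written decoding the paper worries about).

```python
import html
import re

def clean_snippet(snippet: str) -> str:
    """Strip the HTML markup from a Google result snippet,
    leaving plain text suitable for a text-only Eliza reply."""
    text = re.sub(r"<br\s*/?>", " ", snippet)  # break tags become spaces
    text = re.sub(r"</?b>", "", text)          # drop the bold keyword markers
    text = html.unescape(text)                 # decode &amp;, &quot;, etc.
    return re.sub(r"\s+", " ", text).strip()   # collapse leftover whitespace
```

For example, `clean_snippet("a <b>dream</b> about<br>my &amp; your mother")` yields plain text with the markup removed and the entity decoded.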
Further complicating matters, the results include both commercial and non-commercial sites. The commercial sites are not always categorized as such in their directory category attribute, making them difficult at best to filter. When a site has a purely advertising focus, the snippet will likely reflect that marketing language. Identifying these sites is not always possible.

Integration Issues

The author’s initial understanding of Eliza attributed more functionality to it than there really is. The expectation was that nouns and verbs might be identified to better match what the user typed when generating responses. The very simplistic pattern-matching approach works well for its intended purpose, but does not lend itself to easy extension with a tool like the Google API. Instead of choosing the most appropriate words and limiting the search to just those terms, the entire sentence typed by the user must be sent to Google. The double-edged sword of searching on a complete sentence is that it can provide a pseudo-context by looking for pages that match all of the words, but it also ignores the sense of the key word or words, diluting the results. If the user input contains two popular words in an unrelated or contradictory way, results will be returned that match both without regard to the lack of connection, or to the negative connection. While many of the results can work as part of a response, figuring out how to express them in a way that makes logical sense is not always possible. For example, if the returned snippet spans two separate sections of the document, where the contexts are likely to differ at least slightly, phrasing the snippet coherently may not be possible. Trying to develop a way of phrasing the responses so that they make even some sense most of the time is very difficult. To improve the results, evaluation of the entire page identified by Google would be necessary. Google does this already in the generation of the snippet.
It searches the document, highlights the keywords, and provides them in a fragment of context. Since Google indexes documents of various formats, this is especially helpful for non-text documents. While retrieving the cached copy of a document does return an HTML version, it would still require parsing to find the relevant word or words and to generate some context. This all but forces the use of the cached copy for richer responses, and also forces parsing of HTML to generate any response richer than the provided snippet. Knowing how to parse the returned document to identify a matching context and a relevant snippet is considerably beyond the scope of this work. The last issue was how often to call Google and which result to use. Initially, the calls to Google completely replaced the use of the null rule and always selected the first result. The results were less than satisfactory. The integration was changed to use Google only half of the time and to choose a random result from the list returned. If no results are returned, it falls back to the null rule. This generated different enough responses on identical input to keep the interaction more natural and interesting.
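The final scheme described above — a coin flip between Google and the null rule, a random pick from the returned results, and a fallback when the query comes back empty — can be sketched as follows. The function name, the `search` parameter, and the canned responses are all invented for illustration; `search` stands in for the real SOAP call and returns a list of cleaned snippet strings.

```python
import random

NULL_RESPONSES = ["Please go on.", "I see.", "Tell me more."]

def null_rule_response(sentence, search=None):
    """When no script keyword matched, flip a coin: half the time
    try a Google query on the whole sentence; otherwise, or when
    the query returns nothing, use a canned null-rule response."""
    if search is not None and random.random() < 0.5:
        results = search(sentence)
        if results:
            # A random pick (not always the first result) keeps
            # identical inputs from producing identical replies.
            return random.choice(results)
    return random.choice(NULL_RESPONSES)
```

Typing the same unmatched sentence twice can now produce a Google-derived reply one time and a generic prompt the next, which is exactly the variation that made the interaction feel more natural.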


Publication date: 2004